Concept Learning for Cross-Domain Text Classification: A General Probabilistic Framework
Authors
Abstract
Cross-domain learning aims to leverage knowledge from source domains to train accurate models for test data drawn from target domains with different but related data distributions. To tackle the challenge of distribution differences in the raw features, previous works proposed mining high-level concepts (e.g., word clusters) across data domains, which have been shown to be more appropriate for classification. However, all of these works assume that the same set of concepts is shared by the source and target domains, even though some distinct concepts may exist in only one of the domains. A general framework for cross-domain classification is therefore needed that can incorporate both shared and distinct concepts. To this end, we develop a probabilistic model in which both the shared and distinct concepts are learned by an EM procedure that optimizes the data likelihood. To validate the effectiveness of this model, we deliberately construct classification tasks in which distinct concepts exist in the data domains. Systematic experiments demonstrate the superiority of our model over all compared baselines, especially on the more challenging tasks.
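To make the idea concrete, the sketch below shows how shared and domain-specific concepts could be learned jointly with EM in a PLSA-style word-document model: the word distributions of the shared concepts are tied across the source and target domains (their expected counts are pooled in the M-step), while each domain keeps its own distinct concepts. This is a minimal illustration under those assumptions, not the authors' implementation; the function name em_shared_specific, the concept counts k_shared and k_spec, and the toy data are all invented for the example.

import numpy as np

rng = np.random.default_rng(0)

def em_shared_specific(X_src, X_tgt, k_shared=2, k_spec=1, n_iter=50, eps=1e-12):
    """X_src, X_tgt: (documents x words) count matrices for the two domains."""
    V = X_src.shape[1]
    K = k_shared + k_spec                                   # concepts seen by each domain
    doms = [X_src.astype(float), X_tgt.astype(float)]
    # P(w|z): the shared block is tied across domains, the specific block is not
    pw_shared = rng.dirichlet(np.ones(V), size=k_shared)    # (k_shared, V)
    pw_spec = [rng.dirichlet(np.ones(V), size=k_spec) for _ in doms]
    pz_d = [rng.dirichlet(np.ones(K), size=X.shape[0]) for X in doms]   # P(z|d)

    for _ in range(n_iter):
        shared_counts = np.zeros((k_shared, V))
        for i, X in enumerate(doms):
            pw = np.vstack([pw_shared, pw_spec[i]])         # (K, V)
            # E-step: responsibilities P(z|d,w) for every (document, word) pair
            joint = pz_d[i][:, :, None] * pw[None, :, :]    # (D, K, V)
            resp = joint / (joint.sum(axis=1, keepdims=True) + eps)
            # Expected concept-word counts contributed by this domain
            counts = np.einsum('dv,dkv->kv', X, resp)       # (K, V)
            shared_counts += counts[:k_shared]
            spec = counts[k_shared:]
            pw_spec[i] = spec / (spec.sum(axis=1, keepdims=True) + eps)
            # M-step for the document-concept mixtures P(z|d)
            dz = np.einsum('dv,dkv->dk', X, resp)
            pz_d[i] = dz / (dz.sum(axis=1, keepdims=True) + eps)
        # M-step for the tied concepts: pool expected counts from both domains
        pw_shared = shared_counts / (shared_counts.sum(axis=1, keepdims=True) + eps)
    return pw_shared, pw_spec, pz_d

# Toy usage on random count matrices (purely illustrative)
X_src = rng.integers(0, 5, size=(8, 20))
X_tgt = rng.integers(0, 5, size=(6, 20))
pw_shared, pw_spec, pz_d = em_shared_specific(X_src, X_tgt)

Tying only part of the concept-word parameters is what lets the model keep a shared vocabulary-level bridge between domains while still absorbing domain-only topics into the distinct concepts.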
Similar Papers
Monolingual and Cross-Lingual Probabilistic Topic Models and Their Applications in Information Retrieval
Probabilistic topic models are a group of unsupervised generative machine learning models that can be effectively trained on large text collections. They model document content as a two-step generation process, i.e., documents are observed as mixtures of latent topics, while topics are probability distributions over vocabulary words. Recently, a significant research effort has been invested int...
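As a concrete illustration of the two-step generation process mentioned above, the short sketch below samples toy documents from an LDA-like model: each document first draws a mixture over latent topics, and each word is then drawn from the word distribution of a sampled topic. The Dirichlet hyperparameters and corpus sizes are arbitrary assumptions for the example, not values from the cited work.

import numpy as np

rng = np.random.default_rng(1)
n_topics, vocab_size, n_docs, doc_len = 3, 50, 5, 30

# Topics are probability distributions over vocabulary words
topic_word = rng.dirichlet(np.ones(vocab_size) * 0.1, size=n_topics)

documents = []
for _ in range(n_docs):
    # Step 1: a document is a mixture of latent topics
    doc_topics = rng.dirichlet(np.ones(n_topics) * 0.5)
    words = []
    for _ in range(doc_len):
        # Step 2: sample a topic, then sample a word from that topic
        z = rng.choice(n_topics, p=doc_topics)
        w = rng.choice(vocab_size, p=topic_word[z])
        words.append(int(w))
    documents.append(words)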
A COMMON FRAMEWORK FOR LATTICE-VALUED, PROBABILISTIC AND APPROACH UNIFORM (CONVERGENCE) SPACES
We develop a general framework for various lattice-valued, probabilistic and approach uniform convergence spaces. To this end, we use the concept of $s$-stratified $LM$-filter, where $L$ and $M$ are suitable frames. A stratified $LMN$-uniform convergence tower is then a family of structures indexed by a quantale $N$. For different choices of $L,M$ and $N$ we obtain the lattice-valued, probabili...
Incremental Clustering for the Classification of Concept-Drifting Data Streams
Concept drift is a common phenomenon in streaming data environments and constitutes an interesting challenge for researchers in the machine learning and data mining community. This paper proposes a probabilistic representation model for data stream classification and investigates the use of incremental clustering algorithms in order to identify and adapt to concept drift. An experimental study ...
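As a rough, generic illustration of that idea (not the paper's specific representation model), the sketch below maintains cluster centers incrementally over a stream and treats an instance that is far from every existing center as possible evidence of concept drift, seeding a new cluster for it. The class name, distance threshold, and toy stream are assumptions made for the example.

import numpy as np

class IncrementalClusterer:
    def __init__(self, drift_threshold=2.5):
        self.centers = []            # running cluster centers
        self.counts = []             # number of points absorbed by each center
        self.drift_threshold = drift_threshold

    def update(self, x):
        """Consume one stream instance; return True if it suggests drift."""
        x = np.asarray(x, dtype=float)
        if not self.centers:
            self.centers.append(x.copy())
            self.counts.append(1)
            return False
        dists = [np.linalg.norm(x - c) for c in self.centers]
        j = int(np.argmin(dists))
        if dists[j] > self.drift_threshold:
            # Far from every known cluster: open a new one (possible drift)
            self.centers.append(x.copy())
            self.counts.append(1)
            return True
        # Incremental mean update of the nearest center
        self.counts[j] += 1
        self.centers[j] += (x - self.centers[j]) / self.counts[j]
        return False

# Toy stream whose distribution shifts halfway through
rng = np.random.default_rng(2)
stream = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
clusterer = IncrementalClusterer()
drift_flags = [clusterer.update(x) for x in stream]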
Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning
Domain adaptation is a powerful technique when a large amount of labeled data with similar attributes is available in different domains. In real-world applications there is a huge amount of data, but most of it is unlabeled. Domain adaptation is effective in image classification, where obtaining adequate labeled data is expensive and time-consuming. We propose a novel method named DALRRL, which consists of deep ...
Learning to Adapt Across Multimedia Domains
In multimedia, machine learning techniques are often applied to build models that map low-level feature vectors into semantic labels. As data such as images and videos come from a variety of domains (e.g., genres, sources) with different distributions, there is a benefit in adapting models trained on one domain to other domains, in terms of improving performance and reducing computational and hu...